Efficient and exact maximum likelihood quantisation of genomic features using dynamic programming

نویسندگان

  • Mingzhou Song
  • Robert M. Haralick
  • Stéphane Boissinot
چکیده

An efficient and exact dynamic programming algorithm is introduced to quantise a continuous random variable into a discrete random variable that maximises the likelihood of the quantised probability distribution for the original continuous random variable. Quantisation is often useful before statistical analysis and modelling of large discrete network models from observations of multiple continuous random variables. The quantisation algorithm is applied to genomic features including the recombination rate distribution across the chromosomes and the non-coding transposable element LINE-1 in the human genome. The association pattern is studied between the recombination rate, obtained by quantisation at genomic locations around LINE-1 elements, and the length groups of LINE-1 elements, also obtained by quantisation on LINE-1 length. The exact and density-preserving quantisation approach provides an alternative superior to the inexact and distance-based univariate iterative k-means clustering algorithm for discretisation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Change Point Estimation of the Stationary State in Auto Regressive Moving Average Models, Using Maximum Likelihood Estimation and Singular Value Decomposition-based Filtering

In this paper, for the first time, the subject of change point estimation has been utilized in the stationary state of auto regressive moving average (ARMA) (1, 1). In the monitoring phase, in case the features of the question pursue a time series, i.e., ARMA(1,1), on the basis of the maximum likelihood technique, an approach will be developed for the estimation of the stationary state’s change...

متن کامل

Bearing Fault Detection Based on Maximum Likelihood Estimation and Optimized ANN Using the Bees Algorithm

Rotating machinery is the most common machinery in industry. The root of the faults in rotating machinery is often faulty rolling element bearings. This paper presents a technique using optimized artificial neural network by the Bees Algorithm for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (maximum likelihood estima...

متن کامل

Dynamic programming and the graphical representation of error-correcting codes

Graphical representations of codes facilitate the design of computationally efficient decoding algorithms. This is an example of a general connection between dependency graphs, as arise in the representations of Markov random fields, and the dynamic programming principle. We concentrate on two computational tasks: finding the maximum-likelihood codeword and finding its posterior probability, gi...

متن کامل

PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data

Peak detection is a central problem in genomic data analysis, and current algorithms for this task are unsupervised and mostly effective for a single data type and pattern (e.g. H3K4me3 data with a sharp peak pattern). We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm: constrained dynamic programming. We investig...

متن کامل

Solving Single Machine Sequencing to Minimize Maximum Lateness Problem Using Mixed Integer Programming

Despite existing various integer programming for sequencing problems, there is not enoughinformation about practical values of the models. This paper considers the problem of minimizing maximumlateness with release dates and presents four different mixed integer programming (MIP) models to solve thisproblem. These models have been formulated for the classical single machine problem, namely sequ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • International journal of data mining and bioinformatics

دوره 4 2  شماره 

صفحات  -

تاریخ انتشار 2010